Dynamic Programming for Reduced NFAs for Approximate String and Sequence Matching
نویسنده
چکیده
We present a new simulation method for the reduced nondeterministic nite automata (NFAs) for the approximate string and sequence matching using the Levenshtein and generalized Levenshtein distances. These reduced NFAs are used in case that we are interested only in all occurrences of a pattern in an input text such that the edit distance between the pattern and the found strings is less or equal to a given k and we are not interested in the values of these edit distances. The presented simulation method is based on the dynamic programming.
منابع مشابه
An edit operation-based approach to approximate string matching in large DNA databases
In DNA related research, due to various environment conditions, mutations occur very often, where a mutation is defined as a heritable change in the DNA sequence. Therefore, approximate string matching is applied to answer those queries which find mutations. The problem of approximate string matching is that given a user specified parameter, k, we want to find where the substrings, which could ...
متن کاملApproximate string matching as an algebraic computation
Approximate string matching has a long history and employs a wide variety of methods (see e.g. the survey [2]). We consider a variant of approximate matching that compares a fixed pattern string to every substring in the text string by a rational-weighted edit distance (e.g. the indel distance, defined as the number of character insertions and deletions, or the indelsub/Levenshtein distance, wh...
متن کاملFast and Cache-Oblivious Dynamic Programming with Local Dependencies
String comparison such as sequence alignment, edit distance computation, longest common subsequence computation, and approximate string matching is a key task (and often computational bottleneck) in large-scale textual information retrieval. For instance, algorithms for sequence alignment are widely used in bioinformatics to compare DNA and protein sequences. These problems can all be solved us...
متن کاملSequence Alignment as a Database Technology Challenge
Sequence alignment is an important task for molecular biologists. Because alignment basically deals with approximate string matching on large biological sequence collections, it is both data intensive and computationally complex. There exist several tools for the variety of problems related to sequence alignment. Our first observation is that the term ’sequence database’ is used in general for ...
متن کاملLEAP: A Generalization of the Landau-Vishkin Algorithm with Custom Gap Penalties
Motivation: Approximate String Matching is a pivotal problem in the field of computer science. It serves as an integral component for many string algorithms, most notably, DNA read mapping and alignment. The improved LV algorithm proposes an improved dynamic programming strategy over the banded SmithWaterman algorithm but suffers from support of a limited selection of scoring schemes. In this p...
متن کامل